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The computer revolution has been driven by a sustained increase of com- 
putational speed of approximately one order of magnitude (a factor of ten) 
every five years since about 1950. In natural sciences this has led to a con- 
tinuous increase of the importance of computer simulations. Major enabling 
techniques are Markov Chain Monte Carlo (MCMC) and Molecular Dynamics 
(MD) simulations. 

This article deals with the MCMC approach. First basic simulation tech- 
niques, as well as methods for their statistical analysis are reviewed. After- 
wards the focus is on generalized ensembles and biased updating, two ad- 
vanced techniques, which are of relevance for simulations of biomolecules, or 
are expected to become relevant with that respect. In particular we consider 
the multicanonical ensemble and the replica exchange method (also known as 
parallel tempering or method of multiple Markov chains). 

1 Introduction 

Markov chain Monte Carlo (MCMC) calculations started in earnest with the 
1953 paper by Nicholas Metropolis, Arianna Rosenbluth, Marshall Rosen- 
bluth, Augusta Teller and Edward Teller [1J. Since then MCMC simulations 
have become an indispensable tool with applications in many branches of sci- 
ence. Some of those are reviewed in the proceedings [2] of the 2003 Los Alamos 
conference, which celebrated the 50th birthday of Metropolis simulations. 

The purpose of this article is to give a concise overview ranging from 
statistical preliminaries and plain Monte Carlo (MC) calculations over basic 
techniques for MCMC simulations to advanced methods, which are indispens- 
able when it comes to the simulations of complex systems with frustrated 
interactions. For such systems rugged free energy landscapes are typical. Here 
our focus is on biomolecules such as peptides (small proteins) in an all-atom 
approach, defined by a model energy function. At a given temperature this 
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energy function determines in principle the Gibbs ensemble of the molecule. 
In practice equilibrium is sometimes hard to reach in MCMC simulations of 
the canonical ensemble. Considerable improvements can be made by using 
generalized ensembles and biased sampling. 

The first part of this article gives a treatment of the MCMC fundamentals 
that is largely based on the author's book [3] on the subject, which gives many 
more details and contains extensive additional material. The book comes with 
computer code that can be downloaded from the web. The solutions of nu- 
merous numerical assignments, are reproducible by compiling and running the 
corresponding computer programs. Informations and a link to the computer 
code are found on the web at http://www.scs.fsu.edu/~berg. 

The second part of this article builds on original literature of generalized 
ensemble methods [H EJ EJ H El El [TUl HH H2] . We start with a brief history 
and elaborate on the underlying ideas. Subsequently we turn to biophysics, 
where generalized ensembles were introduced in Ref. [13U14] . Finally a scheme 
for biasing the Metropolis updating proposals [15J is considered, which can be 
combined with generalized ensembles. 

As our emphasize is on explaining MCMC methods, we restrict ourselves 
to a simply models, which are well suited for illustrating the essence of a 
method. Our article is organized as follows. The next section introduces the 
MCMC method. Section [3] discusses the statistical analysis of autocorrelated 
MCMC data. Section [4] deals with generalized ensembles and section [5] with 
biased MCMC updating. A short outlook and conclusions are given in the 
final section [6] It should be noted that these lecture notes are not supposed to 
be an unbiased review, but are based on work in which the author has been 
involved our is particularly interested. 



2 Markov Chain Monte Carlo 
2.1 Statistical preliminaries 

Let f(x) be a probability density and x r its associated random variable. The 
(cumulative) distribution function of the random variable x r is defined 
as 

F{x) = P(x r < x) = f f(x)dx (1) 

J — oo 

where P(x r < x) is the probability for x r < x. A particularly simple and im- 
portant case is the uniform probability distribution for random numbers 
between [0, 1), 

u(x) = l 1 for °^ x<1 ' (2) 
v ; 10 elsewhere. v ; 

Remarkably, the uniform distribution allows for the construction of general 
probability distributions. Let 
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y = F(x) = / f(x')dx' 

and assume that the inverse x — F^ 1 {y) exists. For y r being a uniformly 
distributed random variable in the range [0, 1) it follows that 

x r = F-\y r ) (3) 

is distributed according to the probability density f(x). To generate the uni- 
form distribution on a computer, one relies on pseudo random number genera- 
tors. Desirable properties are randomness according to statistical tests, a long 
period, computational efficiency, repeatability, portability, and homogeneity 
(all subsets of bits are random). Our purposes are served well by the generator 
of Marsaglia and collaborators [21] , which comes as part of the code of [3] . 

2.2 Partition function and Potts models 

MC simulations of systems described by the Gibbs canonical ensemble aim 
at calculating estimators of physical observables at a temperature T. In the 
following we choose units so that the Boltzmann constant becomes one, i.e. 
= 1/T . Let us consider the calculation of the expectation value of an ob- 
servable O. Mathematically all systems on a computer are discrete, because 
a finite word length has to be used. Hence, the expectation value is given by 
the sum 

K 

6 = 6{p) = (O) = Z- l Y,0 [k) e-^ Eik) (4) 
fe=i 

K 

where Z = Z{p) = ^ e~ p eW (5) 
fe=i 

is the partition function. The index k = 1, . . . , K labels the configura- 
tions of the system, and is the (internal) energy of configuration k. The 
configurations are also called microstates. To distinguish the configuration 
index from other indices, it is put in parenthesis. 

We introduce generalized Potts models on <i-dimcnsional hypercubic lat- 
tices with periodic boundary conditions (i.e., the models are defined on a torus 
in d dimensions). Without being overly complicated, these models are general 
enough to illustrate the essential features of MCMC simulations. Various sub- 
cases are by themselves of physical interest. We define the energy function of 
the system by 

w»e re ,(„,*,_{ ^ (6, 

The sum (ij) is over the nearest neighbor lattice sites and is called the 
Potts spin or Potts state of configuration k at site i. For the g-state Potts 
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(k) 

model, q\ ' takes on the values l,...,q. The case q = 2 becomes equivalent to 
the Ising ferromagnet. See F.Y. Wu |TB] for a review of Potts models. In d = 2 
dimensions the phase transition is second order for q < 4 and first order for 
q > 5. The exact infinite volume latent heats Ae s and entropy discontinuities 
As were calculated by Baxter [17], while the interface tensions f s were derived 
later, see [18] and references therein. 

2.3 Sampling, reweighting, and important configurations 

For the Ising model (2-state Potts) it is straightforward to sample statisti- 
cally independent configurations. We simply have to generate N spins, 
each either or 1 with 50% likelihood. This is called random sampling. In 
Fig. [1] a thus obtained histogram for the 2d Ising model energy per spin is 
depicted. 



Random Sampling 
Re-weighted to p=0.2 
MC at p=0-2 
MC at p=0-4 




-1.5 



Fig. 1. Energy histograms of 100 000 entries each for the Ising model on a 20 x 20 
lattice: Random Sampling gives statistically independent configurations at j3 — 0. 
Histograms at /3 = 0.2 and /3 = 0.4 are generated with Markov chain MC. Reweight- 
ing of the /3 = random configurations to /3 = 0.2 is shown to fail. These are 
assignments a0301_02 and a0303_02 of [3]. 



Note that it is important to distinguish the energy measurements on single 
configurations from the expectation value. The expectation value e s is a single 
number, while e s fluctuates. From the measurement of many e s values one 
finds an estimator of the mean, e s , which fluctuates with a reduced variance. 

The histogram entries at j3 = can be reweighted so that they corre- 
spond to other (3 values. We simply have to multiply the entry corresponding 
to energy E by exp{—(3E). Similarly histograms corresponding to the Gibbs 
ensemble at some value /3q can be reweighted to other (3 values. Care has to 
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be taken to ensure that the involved arguments of the exponential function 
do not become too large [2j ■ Reweighting has a long history, which we discuss 
in section I4TT1 

In Fig. [1] reweighting is done from (3q = to (3 = 0.2. But, by comparison 
to the histogram from a Metropolis MC calculation at (3 = 0.2, the result is 
seen to be disastrous. The reason is easily identified: In the range where the 
(3 = 0.2 histogram takes on its maximum, the (3 — histogram has not a single 
entry. Our random sampling procedure misses the important configurations 
at (3 = 0.2. Reweighting to new (3 values works only in a range (3q ± A/3, where 
A/3 — > in the infinite volume limit. Details are given in section [4~TI 

Let us determine the important contributions to the partition function. 
The partition function can be re-written as a sum over energies 

Z = Z(f3)=J2n(E)e^ E (7) 

E 

where the unnormalizcd spectral density n{E) is defined as the number of 
microstates k with energy E. For a fixed value of (3 the energy probability 
density 

Pp{E)=Cf ) n(E)e-f ,E (8) 

is peaked around the average value E((3), where is a normalization constant 
determined by ^2 E Pp(E) = 1. The important configurations at temperature 
T = 1/(3 are at the energy values for which the probability density Pp(E) is 
large. To sample them efficiently, one needs a procedure which generates the 
configurations with their Boltzmann weights 

= e^ E(k> . (9) 

The number of configurations n(E) and the weights combine then so that the 
probability to generate a configuration at energy E becomes Pp{E) as given 
by equation 

2.4 Importance sampling and Markov chain Monte Carlo 

For the canonical ensemble importance sampling generates configurations 
k with probability 

P^= CBW ^= CB e^ E(k) (10) 

where the constant cb is determined by the normalization condition P^ = 

1. The vector (P B ) is called Boltzmann state. When configurations are 

stochastically generated with probability P B k ^ , the expectation value becomes 
the arithmetic average: 

, N K 

8 = 6{0) = (O)= lim -^0 (t ">. (11) 

n—1 



6 Bernd A. Berg 



Truncating the sum at some finite value of Nk, we obtain an estimator of 
the expectation value 

_ , N K 

= — VO' 1 '"'. (12) 
N K ^ 

71=1 

Normally, we cannot generate configurations k directly with the probabil- 
ity (|10[) . but they may be found as members of the equilibrium distribution of 
a dynamic process. A Markov process is a particularly simple dynamic pro- 
cess, which generates configuration k n+ \ stochastically from configuration k n , 
so that no information about previous configurations k n -\, k n _2, ■ ■ ■ is needed. 
The elements of the Markov process time series are the configurations. As- 
sume that the configuration k is given. Let the transition probability to create 
the configuration / in one step from k be given by W^(. k ) = W[k — > I]. The 
transition matrix 

defines the Markov process. Note, that this matrix is a very big (never stored 
in the computer), because its labels are the configurations. To generate con- 
figurations with the desired probabilities, the matrix W needs to satisfy the 
following properties: 

(i) Ergodicity: 

e-P Eik) > and e"^' > imply : (14) 

an integer number n > exists so that (W n ) W( k ) > holds. 

(ii) Normalization: 

£VW(*) = i. (15) 
i 

(iii) Balance: 

^ W (0(*) e -^ w = e-^ l) . (16) 

k 

The Boltzmann state (flQ|) is an eigenvector with eigenvalue 1 of the matrix 
W = (W (l) ( k )). 

An ensemble is a collection of configurations for which to each configura- 
tion k a probability pW is assig ned, J2k p(k) = L The Gibbs or Boltzmann 
ensemble Eb is defined to be the ensemble with the probability distribu- 
tion (fT0|) . 

An equilibrium ensemble E eq of the Markov process is defined by its 
probability distribution P eq satisfying 



WP eq = P eq , in components P^ = ^ lyWWpW . (17) 

k 
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Statement: Under the conditions (i), (ii) and (iii) the Boltzmann ensem- 
ble is the only equilibrium ensemble of the Markov process and an attractive 
fixed point. Applying the transition matrix n times give rise to an ensem- 
ble E n . For n — > oo the distance between E n and the Boltzmann ensemble 
decreases asymptotically like 

\\E n -E B \\<eM~*ri)\\E -E B \\ (18) 

where E° is the initial ensemble and A > a constant. 

For a proof the readers are referred to [3]. There are many ways to con- 
struct a Markov process satisfying (i), (ii) and (iii). A stronger condition than 
balance (fl"6|) is 

(iii') Detailed balance: 

w (l)(k) e -pEM = w (k)m e -pEm _ ^ 

Using the normalization J2k W^ k '^ = 1 detailed balance implies balance. 

At this point we have succeeded to replace the canonical ensemble average 
by a time average over an artificial dynamics. Calculating averages over large 
times, like one does in real experiments, is equivalent to calculating averages 
of the ensemble. 



2.5 Metropolis and heatbath algorithm for Potts models 

The Metropolis algorithm can be used whenever one knows how to cal- 
culate the energy of a configuration. Given a configuration k, the Metropolis 
algorithm proposes a configuration I with probability 

/(/, k) normalized to ^ /(/, k) = 1 . (20) 

l 

The new configuration I is accepted with probability 



io = min 



1. 



P. 



(0 



B 



P 



(k) 
B J 



/ 1 for £(') < fiW ,„ n 
1 eM-P(E {l) - E^)] for E^ > . 1 ' 



If the new configuration is rejected, the old configuration has to be counted 
again. The acceptance rate is defined as the ratio of accepted changes over 
proposed moves. With this convention we do not count a move as accepted 
when it proposes the at hand configuration. 

The Metropolis procedure gives rise to the transition probabilities 

W®W = f(l,k)w^ for l^k (22) 
and TU (fe)(fe) = /(fc,/c) + ^/(;,fc)(l-u; ( ' )(fc) ) . (23) 

l^k 
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Therefore, the ratio (W^OM /ffWW) satisfies detailed balance dHJ) if 

/(Z,fc) = f(k,l) holds. (24) 

Otherwise the probability density f(l, k) is unconstrained. So there is an amaz- 
ing flexibility in the choice of the transition probabilities WWW, Also, the 
algorithm generalizes immediately to arbitrary weights. 

If sites are chosen with the uniform probability distribution 1 /N per site, 
where N is the total number of spins, it is obvious that the algorithm fulfills 
detailed balance. It is noteworthy that the procedure remains valid when the 
spins are chosen in the systematic order 1,...,N. Balance (|16l) still holds, 
whereas detailed balance fl9|) is violated (an exercise of Ref. [3]). 

If one performs multiple hits with the Metropolis algorithm at the same 
spin (multi-hit Metropolis algorithm), the local Boltzmann distribution 
defined by its nearest neighbors is approached for an increasing number of 
hits. The heatbath algorithm (HBA) chooses a state directly with the 
local Boltzmann distribution defined by its nearest neighbors. The state qi can 
take on one of the values 1, . . . , q and, with all other states set, determines 
a value of the energy function (J6j) . We denote this energy by E(q{) and the 
Boltzmann probabilities are 

P B {qi) = const e-^M (25) 

where the constant is determined by the normalization condition 

<? 

E P b^) = 1 ■ ( 26 ) 
?i=i 

In equation (|25p we can define E(qi) to be just the contribution of the in- 
teraction of qi with its nearest neighbors to the total energy and absorb the 
other contributions into the overall constant. Here we give a generic HBA 
which works for arbitrary values of q and d (other solutions can be more 
efficient). We calculate the cumulative distribution function of the heatbath 
probabilities 

qi 

P HB (qi) = £ PsWi) ■ (27) 
i'i= l 

The normalization condition (j2"o| implies PnBiq) — 1< Comparison of these 
cumulative probabilities with a uniform random number x r yields the heat- 
bath update qi — ► qt. Note that in the heatbath procedure the original value 
q\ n does not influence the selection of qf ew - 

2.6 Heatbath algorithm for a continuous system 

We give the 0(3) a model as an example of a model with a continuous en- 
ergy function. Expectation values are calculated with respect to the partition 
function 
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Z= Jjldsie-WW* of spins Si = ^s- 2 j (28) 

which are normalized to lie on the unit sphere, («i) 2 = 1. The measure dsi is 
defined by 

ds * = ^ J dcos(6i) J dfa , (29) 

where the polar (9i) and azimuth (0j) angles define the spin Si on the unit 
sphere. The energy is 

E = -5> Sj , (30) 

m 

where the sum goes over the nearest neighbor sites of the lattice and SjSj is the 
dot product of the vectors. The 2d version of the model is of interest to field 
theorists because of its analogies with the four-dimensional Yang-Mills theory. 
In statistical physics the d-dimensional model is known as the Heisenberg 
ferromagnet (references can be found in [3]). 

We would like to update a single spin s. The sum of its 2d neighbors is 

S = 81 + S 2 + . . . + S2d-1 + S 2 d ■ 

Hence, the contribution of spin s to the energy is 2d — sS. We may propose 
a new spin s with the measure (|29|) by drawing two uniformly distributed 
random numbers 

4> r G [0, 2tt) for the azimuth angle and 
cos(6> r ) = x r G [—1, +1) for the cosine of the polar angle . 

This defines the probability function f(s , s) of the Metropolis process, which 
accepts the proposed spin s with probability 



w(s — > s ) 



1 for Ss > Ss, 

exp[-/3(Ss- Ss')} for Ss' <Ss. 



One would prefer to choose s directly with the probability 

W(s^s') = P(s';S) = const e p s ' s . 

The HBA creates this distribution. Implementation of it becomes feasible 
when the energy function allows for an explicit calculation of the probability 
P(s ; S). This is an easy task for the 0(3) cr-model. Let 

a = angle(s , S), x = cos(a) and S = (3\S\ . 

For S = a new spin s is simply obtained by random sampling. We assume 
in the following S > 0. The Boltzmann weight becomes exp(a;S') and the 
normalization constant follows from 
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J dx e xS = - sinh(S) 



Therefore, the desired probability is 

and Eq. ([3]) can be used to generate events with the probability density f{x). 
A uniformly distributed random number y r 6 [0,1) translates into 

x r = cos a'' = — ln[exp(+5) - y r cxp(+5) + y r exp(-S)] . (31) 

To give s a direction in the plane orthogonal to S, one chooses a uniformly 
distributed angle f3 r in the range < (3 r < 2tt. Then, x r — cosa r and (3 r 
completely determine s with respect to S. Before storing s in the computer 
memory, we have to calculate coordinates of s with respect to a Cartesian 
coordinate system, which is globally used for all spins of the lattice. This 
amounts to a linear transformation. 



3 Statistical Errors of MCMC Data 

In large scale MC simulation it may take months, possibly years, to collect 
the necessary statistics. For such data a thorough error analysis is a must. A 
typical MC simulation falls into two parts: 

1. Equilibration: Initial sweeps are performed to reach the equilibrium dis- 
tribution. During these sweeps measurements are either not taken at all 
or they have to be discarded when calculating equilibrium expectation 
values. 

2. Data Production: Sweeps with measurements are performed. Equilib- 
rium expectation values are calculated from this statistics. 

A rule of thumb is: Do not spend more than 50% of your CPU time 
on measurements. The reason for this rule is that one cannot be off by a 
factor worse than two (y/2 in the statistical error). 

How many sweeps should be discarded for reaching equilibrium? In some 
situations this question can be rigorously answered with the Coupling from 
the Past method [19] (for a review see 20J). The next best thing to do is 
to measure the integrated autocorrelation time and to discard, after reaching 
a visually satisfactory situation, a number of sweeps which is larger than 
the integrated autocorrelation time. In practice even this can often not be 
achieved. 

Therefore, it is re-assuring that it is sufficient to pick the number of dis- 
carded sweeps approximately right. With increasing statistics the contribution 



Markov-Chain Monte Carlo Methods for Simulations of Biomolecules 11 

of the non-equilibrium data dies out like 1/iV, where N is the number of mea- 
surements. This is eventually swallowed by the statistical error, which declines 
only like 1 / y/N. The point of discarding the equilibrium configurations is that 
the factor in front of l/N can be large. 

There can be far more involved situations, like that the Markov chain ends 
up in a metastable configuration, which may even stay unnoticed (this tends 
to happen in complex systems like spin glasses or proteins). 

3.1 Autocorrelations 

We like to estimate the expectation value / of some physical observable. We 
assume that the system has reached equilibrium. How many MC sweeps are 
needed to estimate / with some desired accuracy? To answer this question, 
one has to understand the autocorrelations within the Markov chain. 
Given is a time series of N measurements from a Markov process 

fi = f(xi), i = l,...,N, (32) 

where the configurations generated. The label i = l,...,N runs in 

the temporal order of the Markov chain and the elapsed time (measured in 
updates or sweeps) between subsequent measurements /j+i is always the 
same. The estimator of the expectation value / is 

7=^£/<" (33) 

With the notation 

t = \i-j\ 

the definition of the autocorrelation function of the observable / is 
C{t) = Ca =({fi- (/,•» (fj - (/,•)) ) = (fifi) - (fi) (fj) = (faft) - P (34) 

where we used that translation invariance in time holds for the equilibrium 
ensemble. The asymptotic behavior for large t is 

C(t)~exp( — ) for t — ► oo , (35) 

V T cxp / 

where r oxp is called exponential autocorrelation time and is related to 
the second largest eigenvalue Ai of the transition matrix by T cxp = — In Ai 
under the assumption that / has a non-zero projection on the corresponding 
eigenstate. Superselection rules are possible so that different autocorrelation 
times reign for different operators. 

The variance of / is a special case of the autocorrelations (|3"4"]l 



C(0) = a 2 (/) 



(36) 
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Some algebra [3] shows that the variance of the estimator / (|33|) for the mean 
and the autocorrelation function (|34|) are related by 



- 2 (7) 



N 



N-l 



2 E('-s)^) 



with c(t) 



C[t) 
C(0) 



(37) 



This equation ought to be compared with the corresponding equation for 
uncorrelated random variables: cr 2 (/) = a 2 (f)/N. The difference is the factor 
in the bracket of (f37|) . which defines the integrated autocorrelation time 



Tint 



N-l 



(38) 



For correlated data the variance of the mean is by the factor Tj nt larger than 
the corresponding variance for uncorrelated data. In most simulations one is 
interested in the limit N — > oo and equation (f3"8"|) becomes 



Tint = 1 



OO 

t=i 



(39) 



The numerical estimation of the integrated autocorrelation time faces diffi- 
culties. The variance of the estimator for (|39|) diverges, because for large t 
each c(t) adds a constant amount of noise, whereas the signal dies out like 
exp(— t/r CX p). To obtain an estimate one considers the t-dependent estimator 



Tint(t) = l + 2^c(t') 
t'=i 

and looks out for a window in t for which T ni t(£) is flat. 



(40) 



3.2 Integrated autocorrelation time and binning 



Using binning (also called blocking) the integrated autocorrelation time can 
also be estimated via the variance ratio. We bin the time series (|3"2"]l into 
N bs < N bins of 



N h = NBIN = 



" N ' 




' NDAT " 






NBINS 



(41) 



data each. Here [.] stands for Fortran integer division, i.e., Nb = NBIN is the 
largest integer < N/Nb s , implying Nb s ■ Nb < N. It is convenient to choose 
the values of N and Nb s so that N is a multiple of Nb s ■ The binned data are 
the averages 



pN,, 



1 

N~b 



jN b 

E fi 

=l+(j-l)JV 6 



for j = 1, 



, N 



hs 



(42) 
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For TVft > r cxp the autocorrelations are essentially reduced to those between 
nearest neighbor bins and even these approach zero under further increase of 
the binsize. 

For a set of Nb s binned data /. b , (j — 1, . . . , Nj, s ) we may calculate the 
mean with its naive error bar. Assuming for the moment an infinite time series, 
we find the integrated autocorrelation time ([38]) from the following ratio of 
sample variances 

Tint = A^oo ^ With ^ = • (43) 

In practice the Nf, — > oo limit will be reached for a sufficiently large, finite 
value of Ni,. The statistical error of the Tint estimate (l43l) is, in the first 
approximation, determined by the errors of sz Nb ■ The typical situation is then 

that, due to the central limit theorem, the binned data are approximately 
Gaussian, so that the error of sl Nb is analytically known from the x 2 

distribution. Finally, the fluctuations of Sj of the denominator give rise to a 

small correction which can be worked out [3] . 

Numerically most accurate estimates of Tj nt are obtained for the finite 
binsize iVf, which is just large enough that the binned data (j4"2")l are practi- 
cally uncorrelated. While the Student distribution shows that the confidence 
intervals of the error bars from 16 uncorrelated normal data are reasonable 
approximations to those of the Gaussian standard deviation, about 1000 in- 
dependent data are needed to provide a decent estimate of the corresponding 
variance (at the 95% confidence level with an accuracy of slightly better than 
10%). It makes sense to work with error bars from 16 binned data, but the 
error of the error bar, and hence a reliable estimate of Tint, requires far more 
data. 



3.3 Comparison of MCMC algorithms 

Figure [2] illustrates 2d Ising model simulations on the critical point of its 
second order phase transition, j3 = (3 C = ln(l + y/2)/2. Critical slowing 
down is observed: An increase Tint ~ L z with lattice size, where z w 2.17 is 
the dynamical critical exponent of the 2d Ising model. Estimates of z are 
compiled in |22j . Using another MC dynamics the critical slowing down can 
be overcome by cluster updating [23l [24] . 

Figure [3] exhibits the improvements of heat bath over Metropolis 
updating for the 10-state d — 2 Potts model at (3 — 0.62. For this first 
order phase transition there is practically no lattice size dependence of the 
integrated autocorrelation time, once the lattices are large enough. We see 
that the 2-hist Metropolis updating reduces Tint by about a factor of two and 
the heatbath updating reduces it by about a factor of five. 
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Fig. 2. One-hit Metropolis algorithm with sequential updating: Lattice size depen- 
dence of the integrated autocorrelation time for the d — 2 Ising model at its critical 
temperature (assignment a0402_02 D of [5]). The ordering of the curves is identical 
with the ordering of the labels in the figure. 




L=40 1 -hit Metropolis • — 1 — • 

L=80 1 -hit Metropolis - 

L=40 2-hit Metropolis 

L=80 2-hit Metropolis 
L=40 heat bath 

L=80 heat bath - • - 

H--H--H--H--H-i8-!9 ■«■■•»■»■ -6&- 8- ■» -SB ■« B-S-B S 8 m-S-W -B- & " 



100 



150 



200 



250 



Fig. 3. Systematic updating: Comparison of the integrated autocorrelation times 
of the 1-hit and 2-hit Metropolis algorithms and the heat bath algorithm for the 
10-state Potts model on L x L lattices at /3 = 0.62 (assignment a0402_06). The 
L — 40 and L — 80 curves lie almost on top of one another. 
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3.4 Jackknife estimators 

Often one wants to estimate a non-linear function of the mean x of some 
statistical variables / = f(x) where the estimator of x and / are 



1 - 

w =nY, x *> / = /(*)• ( 44 ) 



N 

i=l 

Typically, the bias is of order 1/N: 

bias (/) = J -(f) = ^ + ^ + 0(^) (45) 

where a\ and ai are constants. But for the biased estimator we lost the ability 
to estimate the variance a 2 (f ) — a 2 (f)/N via the standard equation 

1 1 N 

m = „Af) = (46) 

because fa = f(xi) is not a valid estimator of /. The error bar problem for 

the estimator / is conveniently overcome by using jackknife estimators / , 
ff, defined by 

_ 1 N 1 

f J = N^ f * With f * = f{x * ] and ^ = JV^lS 1 * ' (4?) 

i=l k=£i 

The estimator for the variance cr 2 (/ J ) is 

AT - 1 N 

4(7 J ) = (48) 

1=1 

Straightforward algebra shows that in the unbiased case the estimator of the 
jackknife variance (|48|) reduces to the normal variance (|46|) . Notably, only 
order N (not TV 2 ) operations are needed to construct the jackknife averages 
xf , i = 1, . . . , N from the orginal data. 

The jackknife method was introduced in the 1950s [551 US]- F° r a review 
see [3]). It is recommended as the standard for error bar calculations of biased 
estimators. 



3.5 Self-consistent versus reasonable error analysis 

By visual inspection of the time series, one may get an impression about the 
length of the out-of-equilibrium part of the simulation. On top of this one 
should still choose 
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nequi > r int , (49) 

to allow the system to settle. That is a first reason, why it appears necessary 
to control the integrated autocorrelation time of a MC simulations. A second 
reason is that we have to control the error bars of the equilibrium part of our 
simulation. Ideally the error bars are calculated as 

A/ = ^Q) with a 2 (J) = r int ^ . (50) 

This constitutes a self-consistent error analysis of a MC simulation. 

However, the calculation of the integrated autocorrelation time may be 
out of reach. Many more than the about twenty independent data are needed, 
which according to the Student distribution are sufficient to estimate mean 
values with reasonably reliable error bars. 

In practice, one has to be content with what can be done. Often this 
means to rely on the binning method. We simply calculate error bars of 
our ever increasing statistics with respect to a fixed number of 

NBINS > 16 . (51) 

In addition, we may put 10% of the initially planned simulation time away 
for reaching equilibrium. A-posterioH, this can always be increased. Once the 
statistics is large enough, our small number of binned data become effectively 
independent and our error analysis is justified. 

How do we know that the statistics has become large enough? In practical 
applications there can be indirect arguments, like FSS estimates, which tell 
us that the integrated autocorrelation time is in fact (much) smaller than the 
achieved bin length. This is no longer self-consistent, as we perform no explicit 
measurement of Tint, but it is a reasonable error analysis. 

4 Generalized Ensembles for MCMC Simulations 

The MCMC schemes discussed so far simulate the Gibbs canonical ensemble. 
Mean values of physical observables at the temperature chosen are obtained 
as arithmetic averages of the measurements. However, in statistical physics 
one would like to control the partition function, which allows to calculate 
observables at all temperatures and for the the proper normalization of the 
entropy and free energy. Also configurations, which are rare in the canonical, 
but well represented in another ensemble can be of physical interest. Finally 
the efficiency of the Markov process, i.e., the computer time needed for the 
convergence of an estimator of a physical quantity to a desired accuracy can 
depend greatly on the ensemble in which the simulations are performed. 
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4.1 Reweighting of the canonical ensemble 

A first attempt to calculate the partition function by MCMC simulations dates 
back to a 1959 paper by Salsburg et al. [27]. As was already noticed by the 
authors their method is restricted to very small lattices. The reason is that 
their approach relies on what one calls in the modern language reweighting of 
the Gibbs canonical ensemble. It extrapolates data from a canonical simulation 
at one temperature to the canonical ensemble at another temperature. 

The reweighting method has a long history. McDonald and Singer [28] 
were the first to use the equations of [57] to evaluate physical quantities over 
a range of temperatures. But thereafter the method was essentially forgotten 
and a recovery in lattice gauge theory [35J [30] focused on calculating complex 
zeros of the partition function. It remained to the paper by Ferrenberg and 
Swendsen 31J, to formulate crystal-clear for what the method is particularly 
good, and for what not: It allows to focus on peaks of appropriate observables 
in the thermodynamic scaling limit, but it does not allow to cover a finite tem- 
perature range in the infinite volume limit. Off critical points, the reweighting 
range A/3 in /3 — l/(kT) decreases like A/3 ~ 1/yN, where N is the number 
of degrees of freedom, which parametrizes the size of the system (e.g., the 
number of atoms). This follows from the fact that the energy is an extensive 
thermodynamic quantity, E ~ N with fluctuations ~ >/~N. As (3 multiplies 
the energy, the change stays within the fluctuations as long as A/3 N ~ yN, 
so that A/3 ~ 1/VN follows. 

At second order phase transitions the reweighting range actually increases, 
because critical fluctuations are larger than non-critical fluctuations. One has 
then AE ~ N x with 1/2 < x < 1 and the requirement A(3N ~ N x yields 
A/3 ~ For first order phase transitions one has a latent heat AE ~ N, 

but this does not mean that the reweighting range becomes of order one. In 
essence, the fluctuations collapse, because the two phases become separated 
by an interface. One is back to fluctuations within either of the two phases 
where A/3 - 1/y/N holds. 

To estimate the partition function over a finite range Ae in the energy 
density e = E/N, i.e., AE ~ N, one can patch the histograms from canon- 
ical simulations at several temperatures. Such multi-histogram methods 
have also a long tradition too. In 1972 Valleau and Card [32] proposed the 
use of overlapping bridging distributions and called their method "multistage 
sampling" . Free energy and entropy calculations become possible when one 
can link the temperature region of interest with a point in configuration space 
for which exact values of these quantities are known. Ref. [31] stimulated a 
renaissance of this approach. Modern work [331 1311 133 developed efficient 
techniques to combine the overlapping distributions into one estimate of the 
spectral density and to control the statistical errors of the estimate. How- 
ever, the patching of histograms from canonical simulations faces a number 
of limitations: 
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1 . The number of canonical simulations diverges like y/N when one wants to 
cover a finite, non-critical range of the energy density. 

2. At first order phase transition point, the canonical probability of configu- 
ration with an interface decreases ~ exp(— f s A). Here f s is the interfacial 
surface tension and A the minimal area of an interface, which divides the 
system into subsystems of distinct phases. For a system of volume L d the 
area A diverges ~ L^ 1 in the infinite volume limit L — > oo. 

4.2 Generalized ensembles 

One can cope with the difficulties of multi-histogram methods by allowing ar- 
bitrary sampling distributions instead of just the Gibbs-Boltzmann ensemble. 
This was first recognized by Torrie and Valleau [4] when they introduced um- 
brella sampling. However, for the next thirteen years application of this idea 
remained mainly confined to physical chemistry. That there is a potentially 
very broad range of applications for the basic idea remained unrecognized. A 
major barrier, which prevented researchers from trying such extensions, was 
certainly the apparent lack of direct and straightforward ways of determining 
suitable weighting functions for problems at hand. In the words of Li and 
Scheraga [36j : The difficulty of finding such weighting factors has prevented 
wide applications of the umbrella sampling method to many physical systems. 

The turn-around came with the introduction of the multicanonical en- 
semble [7\ [H [9] . These papers focused on one well-defined weight function, 
up to a normalization constant the inverse spectral density, 

w muca {E) ~ l/n{E) for E min <E< E max , (52) 

where n(E) is the number of states, and offered a variety of methods to 
find a "working approximation" of the weight function. Here a working 
approximation is defined as being accurate enough, so that the desired energy 
range will indeed be covered after the weight factors are fixed. A typical 
multicanonical simulation consists then of three parts: 

1. Construct a working approximation of the weight function w; muca . 

2. Perform a conventional MCMC simulation with these weight factors. 

3. Reweight the data to the Gibbs-Boltzmann ensemble at desired tempera- 
tures to obtain estimates of canonical expectation values for observables 
of physical of interest. 

For instance, for the statistical physics system considered in [8], the 2d 
10-state Potts model, finite size scaling consideration allow to construct the 
weight function on a larger lattice in one step from the information about the 
spectral density on the already simulated smaller lattices. This is a solution to 
step (1) in this particular case. The simulation and data analysis (see [3] for 
details) is then rather straightforward. Still, such conceptual simplifications 
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might have changed little on the situation that non-canonical ensembles were 
rarely used, if there were not other favorable circumstance. 

One was that the paper by [5] estimated the interfacial tension of the 
2d 10-state Potts model and produced a result, which was an order of mag- 
nitude smaller than previous numerical estimates by renown computational 
scientists. Normally this would have been followed by an extended debate of 
the advantages and disadvantages of the two competing messages. However, 
shortly after publication of the numerical estimates it turned out that the 
interfacial tension of the 2d 10-state Potts models can be calculated analyt- 
ically 18J, and the multicanonical estimate was within 3% of the rigorous 
result. This attracted attention and gave a lot of researchers confidence in the 
method. 

Another phenomenon was, that at 1991 ± 5 years a number of papers 
[5J O [TTJ [TU1 [H] emerged in the literature, which all aimed at improving 
MCMC calculations by extending the confines of the canonical ensemble. Most 
important has been the replica exchange method, which is also known 
under the names parallel tempering and multiple Markov chains. In the 
context of spin glass simulations an exchange of partial lattice configurations 
at different temperatures was proposed by Swendsen and Wang [5j. But it 
was only in the later works [5j [H] , essentially by rediscovery, recognized that 
the special case for which entire configurations at distinct temperatures are 
exchanged is of utmost importance. 

In the successful replica exchange method one performs n canonical MC 
simulations at different /3-values with Boltzmann weight factors 

Wb^E^) = e-^ Eik) = e- H , i = 0, . . . , n - 1 (53) 
where (3q < (3\ < ... < /3 n _2 < Pn-i, and exchanges neighboring f3- values 

A-i < — ► A for i = 1, . . . , n - 1 . (54) 
Their joint Boltzmann factor is 

e -A-i*«-i-A*« = e - H (55) 
and the /3,_i <-> exchange leads to 

- Aff = {-pi-iE^ - PiE { S) - (-PiE^ ~ Pi-iEfll*) (56) 
= (ft-A-i) (e^-e^I) 

which is accepted or rejected according to the Metropolis algorithm, i.e., with 
probability one for AH < and with probability exp(— AH) for AH > 0. The 
/3i-values should to be determined so that an appropriate large acceptance rate 
is obtained for each f3 exchange. This can be done by recursions [3], which 
are straightforward modifications of a method introduced in Ref. |37j . 
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Finally, and perhaps most important: From about 1992 on applications of 
generalized ensemble methods diversified tremendously. This is documented 
in a number of reviews [351 HOI SI] ■ I n the next section we focus on the 
use of generalized ensembles in biophysics. 

4.3 Generalized ensembles and biophysics 

In Ref. [42] the multicanonical ensemble was first used for simulations of com- 
plex systems with frustrated interactions, in that case the Edwards- Anderson 
Ising spin glass. Multicanonical simulations of biologically relevant molecules 
followed [13j [14] , in Ref. [14] under the name "entropic sampling" , but this is 
just identical with multicanonical sampling [43j . 

The interactions defined by an energy function are frustrated if one can- 
not simultaneously align all variables favorably with respect to their mutual 
interactions. So one gets "frustrated" , a situation well known to political de- 
cision makers. In physical models frustrated interactions lead to a rugged free 
energy landscape. Canonical MCMC simulations tend to get stuck in the free 
energy barriers. In a nutshell, another ensemble may smoothen out such bar- 
riers, jump them, or at least allow to escape from them, for instance through a 
disordered phase. This can accelerate the over-all convergence of the MCMC 
process considerably. If all parameters are chosen right, reweighting will fi- 
nally allow to reconstruct canonical ensemble expectation values at desired 
temperatures. 

The parallel tempering (replica exchange) method was introduced in 
Ref. [33] to the simulation of biomolecules. In particular its extension to Molec- 
ular Dynamics [15] has subsequently been tremendously successful. Nowadays 
folding of small proteins is achieved using PC clusters and it appears that all 
these simulations rely on some form of the replica exchange method. 

4.4 Overcoming free energy barriers 

Basic mechanisms for overcoming energy barriers are best illustrated for first- 
order phase transitions. There one deals with the simplified situation of a 
single barrier, which is easily visualized by plotting histograms of an appro- 
priate observable. To give a first example, Fig. [4] shows for the 2d 10-state 
Potts model the canonical energy histogram at a pseudo-critical temperature 
versus the energy histogram of a multicanonical simulation on a 70 x 70 lattice 
[5J . The energy barrier is overcome by enhancing the probabilities for configu- 
rations, which are suppressed in the canonical ensemble due to an interfacial 
tension. 

The same barrier can be overcome by a parallel tempering simulation, but 
in a quite different way. Fig. [5j shows the histograms from a parallel tem- 
pering simulation with eight processes on 20 x 20 lattices. Each histogram 
corresponds to a fixed temperature, given by the (3 values in the figure. The 
@4 and (3$ values produce the clearest double peak histograms. For f}± the 
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Fig. 4. Multicanonical P m u{E) together with canonical P(E) energy distribution 
as obtained in Ref.[8] for the 2d 10-state Potts model on a 70 x 70 lattice. 
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Fig. 5. Energy histograms from a parallel tempering simulation with eight processes 
for the 2d 10-state Potts model on 20 x 20 lattices (assignment a0603_04 in [3]). 



higher peak is in the disordered region and for /3 5 it is in the ordered region. 
So the barrier can be "jumped" when there are at least two temperatures 
in the ensemble, which are sufficiently close to the particular pseudo-critical 
temperature for which the two peaks of the histogram assume equal heights 
(pseudocritical temperatures may also be defined by giving equal weights to 
the two parts of the histogram [46]). One of these two temperatures has to 
be in the ordered, the other in the disordered phase, and their start config- 
urations have to be in the corresponding phases. The barrier can be jumped 
by an exchange of these two temperatures. If the barrier is high enough, so 
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that a single canonical simulations does not cross it during the simulation 
time, the jumping process determines the relative height of the two barriers. 
Adding more close-by temperatures to the ensemble will increase to the accu- 
racy. Additional complications can occur, if a rare canonical tunneling process 
(crossing of the barrier) takes actually place. 

Let us compare with a multicanonical simulation. The multicanonical 
method flattens the barrier, whereas the parallel tempering simulation al- 
lows to jump it. Using larger lattices the multicanonical method is well suited 
for calculations of the interface tensions from energy histograms [5]. For par- 
allel tempering this is not the case, because the sampling of each histogram 
is still canonical. This can be overcome by a Gaussian variant of simulated 
tempering [47] , 



Fig. 6. Example configurations from a multicanonical simulation of poly-alanine [48] 
(courtesy Ulrich Hansmann and Yuko Okamoto). 

In complex systems with a rugged free energy landscape the barriers can no 
longer be explicitly controlled. Nevertheless it has turned out that switching to 
the discussed ensembles can greatly enhance the MCMC efficiency. A variety 
of biologically relevant applications are reviewed in Ref. [39],l4T| and in some of 
the lectures of this volume. Here we confine ourselves to showing a particularly 
nice example in Fig. [6j The folding of poly-alanine into its a-helix coil [48] . 
No a-priori information about the groundstate conformation is used in these 
kind of simulations. Whereas in a canonical simulation one would in essence 
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get stuck in a configuration like the second of the first row of this figure, the 
multicanonical simulation finds its way in and out of the helix structure. 
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Fig. 7. Free energy landscape of Met-Enkephalin at T = 250 K with respect to rms 
distances (A) from two reference configurations [ST]. 

Variations of the basic idea introduce weights in other variables than the 
energy (or even in several variables). For instance, weights in the magnetic 
field of spin systems were introduced quite some while ago [35]. For spin 
glasses weights in the Parisi overlap variable are used [5U]. The overlap of 
the configuration at hand with two reference configurations allows one to 
determine the transition state efficiently [5TJ . Fig. [7] shows a transition state 
located by this method in the free energy landscape of a simple peptide (Met- 
Enkephalin, which is discussed in the next section). In this figure contour lines 
are drawn every 2fcsT. The labels Ai and Bi indicate the positions for the 
local-minimum states that originate from the reference configuration 1 and 
the reference configuration 2, respectively. The label C stands for the saddle 
point that corresponds to the transition state. 
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4.5 Weight factor recursions 

For systems with rugged free energy landscapes FSS methods for estimating 
the weight factors are normally not available, because the systems are either of 
one particular size (e.g., biomolcculcs) or change too much when the system 
size is increased (e.g., spin glasses). In view of this one has to rely on ad- 
hoc per hand estimates or, more conveniently, on general purpose recursions. 
Designs for the latter were reported in a number of papers [52J [53J IM1 [55] • 

The recursions have in common that they intend to create weight factors 
which are inversely proportional to the spectral density of the system at hand 
(modifications to target other weights are straightforward). The Wang-Landau 
(WL) recursion [55J [55] differs fundamentally from the other approaches by 
iterating the weight factor at energy E multiplicatively instead of additivcly. 
At a first glance the WL approach is counter-intuitive, because the correct it- 
eration of the weight factor after a sufficiently long simulation is obviously 
proportional to one over the number of histogram entries H(E) and not 
l/{fwL) H[E) with fwL > 1. The advantage of the WL recursion is that 
it moves right away rapidly through the targeted energy range. Once this en- 
ergy range is sufficiently covered, the WL method refines its iteration factor 
by fwL —> V ' fwL-, so that it approaches 1. In this way the working approx- 
imations to the desired weight factors can be obtained on any finite energy 
range. In the following we give details for a variant [3J of the multicanonical 
recursion [52] (the modifications rely on work with W. Janke) and the WL 
recursion [55] , 

The multicanonical recursion uses the parameterization [3 [42] 

w (a) = e - s ^ = e -6(B.)B.+«CB.) 
of the weights, where (for e smallest stepsize) 

b(E) = [S(E + e) - S(E)] /e and 
a(E - e) = a(E) + [b(E - e) - b{E)\ E . 

After some algebra and other consideration [3J the recursion reads 

b n+1 (E) = b n (E)+g%(E) [\nH n {E + e)-hxH n {E)]/e, 
gS(E)=gZ(E)/lg n (E)+gZ(E)}, 
g'S(E) = H n (E + e) H n {E) / [H n (E + e) + H n (E)} , 

g n+1 (E)=g n (E)+gZ(E), g°(E) = 0. 

For continuous systems like biomolecules some modification are required, 
see [57] . 

For the WL recursion updates are performed with estimators p(E) of the 
density of states 

'p(Ei) 



p(Ei — > E2) = min 
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Each time an energy level is visited, they update the estimator 

p(E) -> p{E) f WL 

where, initially, p(E) — 1 and fwL — fa = e 1 . Once the desired energy range 
is covered, the factor Jwl is refined to 

/l = V / 7 1 fn+1 — V fn+l 

until some small value like Jwl = e~ 8 = 1.00000001 is reached. For Jwl 
very close to one the difference to a simulation with fixed weights becomes 
negligible, so that one may just keep on iterating Jwl —> V Jwl- However, it 
appears that such a continued iteration is far less efficient than switching to 
a simulation with fixed weights as soon as a working estimate is found. Sur- 
prisingly there appears to be only one comparative study [58] of the different 
recursions, which finds that overall performances are similar. 

5 Biased Markov Chain Monte Carlo 

Consider a random variable y which is sampled with a probability density func- 
tion (PDF) P{y) on an interval [2/1,2/2]- The cumulative distribution function 
(CDF) is defined by 

z = F(y) = [ V P(y')dy' and P{y) = (57) 

J vi d v 

where P(y) is properly normalized so that F (00) = 1 holds. 

The HBA generates y by converting a uniformly distributed random num- 
ber < z < 1 into 

y = F- 1 (z). (58) 

As the acceptance rate is defined by the number of accepted changes divided 
by the total number of proposed moves, the acceptance rate of the HBA is 
always 1 (a new value of y is generated on every step). In simulations the 
inversion of the CDF (|Ff|) may be unacceptably slow or the CDF itself may 
not be a priori known. Then one has to rely on other approaches. 

In the conventional Metropolis scheme y new is generated uniformly in a 
range [2/1,2/2] (we refer to this as proposal) and accepted with probability 
(accept/reject step) 

P{Unew) \ /rn\ 
P{Vold) J 

This process can have a low acceptance rate in the region of interest. Possible 
remedies are to decrease the proposal range, which makes the moves small, 
or use of multi-hit Metropolis. Excluding CPU time considerations for the 
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moment, measured by the integrated autocorrelation time both remedies are 
normally less efficient than a HBA. 

Hastings |59j identified proposal probabilities, which are more general than 
those of the conventional Metropolis scheme, but gave no guidance whether 
some probabilities may be preferable over others. If one does not propose y new 
uniformly anymore, the name Biased Metropolis Algorithm (BMA) is 
often used. Some biased Metropolis simulations can be found in the literature 
where the bias is introduced to match special situations [BUI [S3 ISH [Ml H3] • 
In the following we discuss a general biased Metropolis scheme [T51 [B5] > which 
aims at approximating hcatbath probabilities. 

Let us discretize y into n bins as 

yi =y° <y x <y 2 < ... < y n = y 2 (60) 
where the lengths of the bins are 

Ay 3 = y 3 — y 3 , with j — 1, ...,n. (61) 

A BMA can then be defined by the following steps: 

• Propose a new value y new by randomly picking a bin j new and then propos- 
ing y ne w uniformly in the given bin. Two uniform random numbers ri, r 2 
are needed: 

jnew = 1 + Int[nri] and y new = y 3 ™'™- 1 + Ay 3 """ r 2 , (62) 

where Int[nri] denotes rounding to the largest integer < nr\. 

• Locate the bin j Q id to which y id belongs: 

y joU-l <y oU <yioU. ( 6 3) 

• Accept y new with probability: 

. f 1 P{yne W ) A^— ) 

PBMA = mm \ h Ti^) A^r/ (64) 

Pbma in ([6"4")> differs frompMet hi ([55]) by the bias Ay Jnam /Ay Jotd . The scheme 
outlined in ([62| - (|64)l satisfies balance or detailed balance in the same way as 
the original Metropolis algorithm, while the bias changes the acceptance rate. 

So far the partitioning y 3 has not been introduced explicitly. A particular 
choice is: 

3 - = F(y 3 ) or y 3 = F~ x ( ^ ) . (65) 
n \n J 

This equation achieves equidistant partitioning on the CDF ordinate. Let us 
pick a bin initially labeled j and take the limit n — > oo so that this bin 
collapses into a point labeled z. Then this BMA approaches the HBA and the 
acceptance rate converges to 1: 



Markov-Chain Monte Carlo Methods for Simulations of Biomolecules 



27 



P{Vnew) Ay J " e 

P{y old ) Ayioi. 



P(yue W ) l/P(y lie w ) 

PiVold) l/P{Vold) 



= l. 



(66) 



Therefore we call a BMA with the partitioning ([6"5)) Biased Metropolis- 
Heatbath Algorithm (BMA). Restricted to the updating of one dynamical 
variable the improvements are similar to those seen in Fig. [3] where the gain 
is a factor of five. Having in mind that that realistic simulations take usually 
months of computer time, such factors are highly welcome. Extending the 
biased method to simultaneous updating of more than one variable, larger 
improvement factors can be anticipated. But extensions to more than two 
variables face technical difficulties. 

5.1 Application to a continuous model 

Following Ref. [65] we illustrate the BMHA for a system with a continuous 
energy function: Pure lattice gauge theory calculations with the [7(1 ) gauge 
group for which the "matrices" are complex numbers on the unit circle, pa- 
rameterized by an angle tfi € [0, 2tt). After defining the theory on the links of 
a four-dimensional lattice and going through some algebra, the PDF 



has to be sampled, where a is a parameter associated to the interaction of the 
link being updated with its environment. The corresponding CDF is 



where N a ensures the normalization F a (2ir) = 1. In the following we consider 
U(l) gauge theory at a coupling close to the critical point for which one finds 
< a < 6. 

Let us discretize the parameter a into m = 2™ 1 = 16 (ni = 4) bins, 
choosing equidistant partitioning. In each a 1 bin we discretize <j) using the 
condition (|6"5)) with n = 2" 2 = 16 (ri2 = 4). Two two-dimensional arrays are 
needed: one for storing J (levels themselves) and another for A0 JJ — (ft 1 J — 
02 ,j- 1 (distances between levels), see Fig. [8] For a given a 1 it is straightforward 
to apply BMA step ([62]) . E.g., for a = a 11 , the cross section of the F a {4>) 
surface plane is then shown in Fig. [9] To determine the bin label joid which 
belongs to the (known) value 4>o,oid (BMA step ([Ml)) °ne may use the n 2 -step 
recursion j — » j + 2* 2 sign ( (f> — ), %<i —* i% — 1. Once j id is known it 
gives the length of the bin: A(/> IJo!d and the final accept/reject step ([64| can 
be applied: 



P a (<f>) = N a e' 



Q COS(0) 



(67) 




(68) 



(69) 
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Fig. 8. m x n partitioning of Fig. 9. Discretization of the CDF 

for U(l) at the coupling constant value F a ii(4>) for £7(1) corresponding to the 
discussed in the text. 11th bin of Fig. [S] 



In this example the CDF is known. We have shown that sampling with 
the BMHA is essentially equivalent to using the HBA, but can be nu- 
merically faster, as is the case for £7(1). SU(2) lattice gauge theory with 
the fundamental-adjoint action allows for substantial speed-ups by using a 
BMHA [66j . In the next section we show how a similar biasing procedure can 
be used when the CDF is not known (making a HBA impossible) and extend 
it to two variables. 

5.2 Rugged Metropolis, a biasing scheme for biophysics 

We consider biomolecule models for which the energy E is a function of 
a number of dynamical variables Vi, i — l,...,n. The fluctuations in the 
Gibbs canonical ensemble are described by a probability density function 
p(v%, . . . , v n ; T) — const exp[— f3E(vi, . . . , v n )], where T is the temperature, 
f3 = l/(fcT), and E is the energy of the system. To be consistent with the 
notation of [T3] we now use p[v\, ■ ■ ■ , v n ; T) instead of P(y) introduced in 
previous one-variable example. Proposing a new variable (with the other 
variables fixed) from the PDF constitutes a HBA. However, an implemen- 
tation of a HBA is only possible when the CDF of the PDF can be con- 
trolled. In particular this requires the normalization constant in front of the 
exp[— (3 E(v\, . . . ,v n )] Boltzmann factor, which is normally unknown. Then 
the following strategy can provide a useful approximation. 
For a range of temperatures 

Ti > T 2 > ... > T r > ... > Tf-i > T f (70) 

the simulation at the highest temperature, Ti, is performed with the usual 
Metropolis algorithm and the results are used to construct an estimator 



p(vi, . . .,v n ;Ti) 
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which is used to bias the simulation at T 2 . Recursively, the estimated PDF 

/5(«i, . . . ,u n ;T r _i) 

is expected to be a useful approximation of p{v\, . . . , v n ; T r ). Formally this 
means that BMA acceptance step (|64l) at temperature T r is of the form 



RM 



1 exp (-/? E') p(v!, . . .,v n \T r -i, . ^ 7i ^ 
' exp(-/3E) p(v' 1 ,...,v' n ;T r ^ 1 



where /3 = l/(kT). For this type of BMA, where the bias is constructed by 
using information from a higher temperature, the name Rugged Metropolis 
(RM) was given in [T5] . 

Our test case in the following will is the small brain peptide Met- 
Enkephalin (Tyr-Gly-Gly-Phe-Met) in vacuum, which features 24 dihedral an- 
gels as dynamical variables. Its global energy minimum was determined some 
time ago by Li and Scheraga [57]. Ever since this molecule is often used as a 
simple laboratory for testing new computational methods. We rely on the all- 
atom energy function ECEPP/2 (Empirical Conformational Energy Program 
for Peptides) [B5j as implemented in the SMMP (Simple Molecular Mechanics 
for Proteins) [69j program package. Besides the <fr, -0 angles, we keep also the 
u angles unconstrained, which are usually restricted to [w — ir/9,ir + 7r/9]. 
This allows us to illustrate the RM idea for a particularly simple case. 

To get things started, we need to construct an estimator p(vi, ... ,v n ; T r ) 
from the numerical data of the RM simulation at temperature T r . This is 
neither simple nor straightforward, and approximations have to be used. 



The RMi approximation 

In Ref. [15] the factorization 

n 

p{vi, ■ ■ ■ ,v n \T r ) = \\'p]{v l ]T r ) (72) 
i=i 

was investigated, where pj (vf, T r ) are estimators of reduced one- variable PDFs 
defined by 




The resulting algorithm, called RMi, constitutes the simplest RM scheme 
possible. The CDFs are defined by 

Fi(v) = f dv'pKv') . (74) 

J — 7T 

The estimate of Fio, the CDF for the dihedral angle Gly-3 <j> (i>io), from the 
simulations at our highest temperature, T\ — 400 K, is shown in Fig. [TO] For 
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Fig. 10. Estimate of the cumula- 
tive distribution function for the Met- 
Enkephalin dihedral angle Wio (Gly-3 (j>) 
at 400 K. 
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Fig. 11. Estimate of the cumula- 
tive distribution function for the Met- 
Enkephalin dihedral angle 119 (Gly-2 uo) 
at 400 K. 



our plots we use degrees, while we use radiant in our theoretical discussions 
and in the computer programs. Fig. [10] is obtained by sorting all ndat values 
of v\o in our time series in ascending order and increasing the values of F\q 
by 1/ridat whenever a measured value of Vio is encountered. Using a heapsort 
approach [3], the sorting is done in ridat log 2 (rid a t) steps. 

Figure [Tl] shows the CDF for v 9 (Gly-2 id) at 400 K, which is the angle 
of lowest acceptance rate in the conventional Metropolis updating. This dis- 
tribution function corresponds to a histogram narrowly peaked around ±7r, 
which is explained by the specific electronic hybridization of the CO-N pep- 
tide bond. From the grid shown in Fig. QT] it is seen that the RMi updating 
concentrates the proposal for this angle in the range slightly above — 7r and 
slightly below +ir. Thus the procedure has automatically a similar effect as 
the often used restriction to the range [it — 7r/9, it + tt/9], which is also the 
default implementation in SMMP. 

After the empirical CDFs are constructed for each angle Vi, they are dis- 
cretized using the condition (|65|) . Here we denote differences (IFTj) needed for 
the bias by 

^ v i,j — v i,j ~ v i,j-l with Vifl — —ir . (75) 

The RMi updating of each dihedral angle Vi follows then the BMA procedure 
(|rj2")) - ([M]) . The accept/reject step in the Vij notation is 

. / exp(-0E') Av h3 _ 

prmi =nm 1, — nr^v — 

L exp(-(3E) Av iijald 

The RM2 approximation 

In Ref. [70] the RMi scheme of Eq. ([75]) was generalized to the simultaneous 
updating of two dihedral angles. For i\ ^ 12 reduced two- variable PDFs are 
defined by 
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/ + 7T ^ 
II dvj p(vj,...,v n ]T) . (77) 

The one- variable CDFs and the discretization j, j = 0, . . . ,n are al- 
ready given by Eqs. (|T4[) and (|75|) . We define conditional CDFs by 

Fh,i2-j( v )= dv *2 dv h p\ , i2 (uii,^J (78) 

for which the normalization F ili2 .j(ir) = 1/n holds. To extend the RM X up- 
dating to two variables we define for each integer fc = l,...,n the value 
F lu i 2 - M = k/n 2 . Next we define Vi ltir j t k through F lui2 .^ k = F lu i 2 . ] (v lu i 2 - j ^) 
and also the differences 

& v itM3,k = «»i,*2u,fc ~ «»i,«2y,fe-i witn v iiMj,0 = _7r ■ ( 79 ) 

The RM2 procedure for the simultaneous update of {v^^v^) is then specified 
as follows: 

• Propose a new value Vi lyTlew using two uniform random numbers 7*1, r 2 
(BMA step (J62J) for the angle ii): 

inew = [nn] and Vi I)f , e u> = Viij naw -i + Av iltjnew r 2 . (80) 

• PrOpOSG cl new VcLllie Vi 2 ,new 

using two uniform random numbers r 3 , r± 
(BMA step fl52J) for the angle i 2 ): 

knew = [nrs] and w l2i „ e «) = v h,i 2 ;j n ^,k ncw -i + Av iui2; j nem ^ new r i- (81) 

• Find the bin index j a id for the present angle Vi li0 M through iVj old -i — 
ViLoid < v i 1: j i d , just like for RMi updating (BMA step for VjJ. 

• Find the bin index k id for the present angle «i 2;0 zd through Vi 1 ^ 2 -j oldi k old -i < 
Vi 2 ,oid < v ii,i 2 ;joid,kaid (again step (J63J) but for v l2 ). 

• Accept {vi linew , Vi 2>new ) with the probability 

exp(-ffg') Av ilJncu , Av n ^ Jn _ k _ \ 
exp(-/3£) Av iujoli Av iu i 2 - joldt k old J 

As for RMi, estimates of the conditional CDFs and the intervals Av^^-j^ 
are obtained from the conventional Metropolis simulation at 400 K. In the fol- 
lowing we focus on the pairs (vT,Vg), (uio, t>ii) and (vi5,vie). These angles 
correspond to the largest integrated autocorrelation times of the RMi pro- 
cedure and are expected to be strongly correlated with one another because 
they are pairs of dihedral angles around a C a atom. 

The bias of the acceptance probability given in Eq. is governed by 
the areas 

AAi 1 ^ 2 .j k — Avi 1 j Avi lt i 3 .j t k ■ 



Prm 2 
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Fig. 12. Areas of equal probabilities Fig. 13. Areas of equal probabilities 
(sorting Vf then v%). (sorting Vio then t)u). 



For i% = 7 and — 8 our 400 K estimates of these areas are depicted in 
Fig. [T2j For the RM2 procedure these areas take the role which the intervals 
on the abscissa of Fig. [10] play for RMi updating. The small and the large 
areas are proposed with equal probabilities, so the a-priori probability for 
our two angles is high in a small area and low in a large area. Areas of high 
probability correspond to allowed regions in the Ramachandran map of a Gly 
residue [71] . In Fig. [T^] the largest area is 503.4 times the smallest area. Note 
that the order in which the angles are sorted introduces some differences. 
Fig. [TBI gives a plot for («i5,Wi6) pairs in which the angle with the smaller 
subscript is sorted first. The ratio of the largest area over the smallest area 
is 2565.8. The large number is related to the fact that (wi5,«i6) is the pair 
of (j), ip angles around the C a atom of Phe-4, for which positive <fi values are 
disallowed [TTj . 

Reductions in the integrated autocorrelations times of the angles vary and 
are again similar to those ovserved in Fig. [3] when moving from the ordinary 
Metropolis to the HBA. 

6 Outlook and Conclusions 

Spin systems, lattice gauge theory and biophysics models are certainly far 
apart in their scientific objectives. Nevertheless quite similar computational 
techniques allow for efficient MCMC simulations in either field. Cross-fertilization 
may go in both directions. For instance, generalized ensemble techniques prop- 
agated from lattice gauge theory [7] over statistical physics [42] into biophysics 
[j"3] , while it appears that biased Metropolis techniques [60] [61j [62J. [63] HI] 
propagate in the opposite direction [55] . 

It is rather straightforward to combine the biased techniques of section [SJ 
with generalized ensembles, but work in this direction has just begun. In 
Ref. [J5] and [75] combinations with parallel tempering have been studied and 
the improvements were approximately multiplicative. An implementation into 
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the multicanonical ensemble has recently been worked out [73j and applied 
in a study of the deconfining phase transition of U(l) lattice gauge theory. 
Extension of the rugged MC approach to MD are also possible |74j . As this 
leads into ongoing research, it is a good point to conclude these lecture notes 
at this point. 
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